Entry Name:  "UBA-Bel-MC2"

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:

Martín Bel, University of Buenos Aires, m4rbel@gmail.com    PRIMARY

Nadia Romano, University of Buenos Aires, nadia.romano@gmail.com
Delia Balaoi, University of Buenos Aires, delia.balaoi@gmail.com

 

Student Team: YES

Did you use data from both mini-challenges? NO

 

Analytic Tools Used:

Tableau

The following open-source packages/libraries:

R – data.table: M Dowle, T Short, S Lianoglou, A Srinivasan with contributions from R Saporta, E Antonyan

R – ggplot2: H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

igraph – Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.org

R – Shiny: http://shiny.rstudio.com

Plot.ly – D3-based library used to make ggplot2 R objects interactive

Dygraphs.js

We have also used MySQL as a database for part of the analysis.

 

 

Approximately how many hours were spent working on this submission in total?

100 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES

 

Video Download

Video:

http://www.youtube.com/watch?v=Kw8Z3HLBJ6Q

To download: https://drive.google.com/file/d/0B27xCoMNYGfVbGxpQl9UNDBIS3c/view

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Identify those IDs that stand out for their large volumes of communication. For each of these IDs:

   a. Characterize the communication patterns you see.

   b. Based on these patterns, what do you hypothesize about these IDs?

 

Limit your response to no more than 4 images and 300 words.

 

 

The IDs that stand out for their large volume of communications account for 12.08% of the total volume of incoming and outgoing messages. These IDs are 1278894 and 839736.

We counted the outbound messages of each ID in five-minute intervals and plotted them by day and ID, as shown in Figure 1.
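The five-minute binning can be sketched in Python as follows. This is a minimal, standard-library sketch, not the code behind Figure 1 (which was produced with the R and Tableau tools listed above). It assumes each message is a tuple (timestamp, sender, recipient, location) with timestamps formatted like "2014-06-08 11:58:10"; the sample rows and the function name `outbound_counts` are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (timestamp, sender, recipient, location).
RECORDS = [
    ("2014-06-08 11:58:10", "839736", "1001", "Entry Corridor"),
    ("2014-06-08 11:59:40", "839736", "1002", "Entry Corridor"),
    ("2014-06-08 12:01:05", "839736", "1003", "Entry Corridor"),
    ("2014-06-08 12:02:30", "1278894", "1004", "Entry Corridor"),
]


def outbound_counts(records, bin_minutes=5):
    """Count outbound messages per (weekday, sender, bin start)."""
    counts = Counter()
    for ts, sender, _recipient, _location in records:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Floor the timestamp to the start of its fixed-width bin.
        start = t.replace(minute=t.minute - t.minute % bin_minutes, second=0)
        counts[(t.strftime("%A"), sender, start.strftime("%H:%M"))] += 1
    return counts


print(outbound_counts(RECORDS))
```

`bin_minutes` should divide 60 evenly so that the floored bins stay aligned across hours.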

 

 


Figure 1: Outbound messages from high volume IDs. Interactive version

 

Figure 1 shows a sharp increase in outbound messages from ID 839736 around 12:00 p.m. on Sunday, followed by a second, smaller peak around 2:00 p.m. This ID sends messages from Entry Corridor to visitors every minute of the day. We hypothesize that it is an automated service (a bot) of the park’s app that tries to locate every user.

The second ID, 1278894 (see Figure 2), behaves regularly and also sends messages from Entry Corridor. It sends messages during five-minute windows at specific hours: 12:00, 14:00, 16:00, 18:00 and 20:00. The only peculiarity we found is a decrease in messages between 14:40 and 15:00 on Friday and Saturday, with no such drop on Sunday. However, this behaviour does not seem relevant to understanding the crime.
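One way to check the "every minute of the day" observation is to measure what fraction of the minutes in an ID’s active span contain at least one message. The sketch below is hypothetical (`minute_coverage` and the sample timestamps are illustrative, and the timestamp format is assumed); a value close to 1 is consistent with automated sending but does not by itself prove that the ID is a bot.

```python
from datetime import datetime, timedelta


def minute_coverage(timestamps):
    """Share of minutes between an ID's first and last message that contain a message."""
    minutes = {
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0)
        for ts in timestamps
    }
    # Number of whole minutes spanned, inclusive of both endpoints.
    span = (max(minutes) - min(minutes)) // timedelta(minutes=1) + 1
    return len(minutes) / span


bot_like = ["2014-06-08 08:00:05", "2014-06-08 08:01:30", "2014-06-08 08:02:10"]
sparse = ["2014-06-08 08:00:00", "2014-06-08 08:04:00"]
print(minute_coverage(bot_like), minute_coverage(sparse))  # 1.0 0.4
```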
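The scheduled five-minute sending windows can be recovered by splitting an ID’s sorted message times wherever there is a long silence. A minimal sketch, assuming the same timestamp format as the challenge data; the 10-minute gap threshold, the function name `sending_windows` and the sample times are illustrative assumptions.

```python
from datetime import datetime, timedelta


def sending_windows(timestamps, max_gap=timedelta(minutes=10)):
    """Split an ID's message times into windows separated by gaps longer than max_gap."""
    times = sorted(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in timestamps)
    windows = []
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > max_gap:  # a long silence closes the current window
            windows.append((start, prev))
            start = t
        prev = t
    windows.append((start, prev))
    return [(a.strftime("%H:%M"), b.strftime("%H:%M")) for a, b in windows]


sample = ["2014-06-06 12:00:10", "2014-06-06 12:02:00", "2014-06-06 12:04:50",
          "2014-06-06 14:00:05", "2014-06-06 14:03:00"]
print(sending_windows(sample))  # [('12:00', '12:04'), ('14:00', '14:03')]
```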

 

 


Figure 2:  Outbound messages from 1278894

 

So far we have analysed outbound messages from the highest-volume IDs. We now turn to incoming messages: we counted the messages received by ID 839736 on Sunday, grouped by location in five-minute intervals (Figure 3).
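The per-location breakdown of incoming traffic can be sketched as below, assuming each message is a (timestamp, sender, recipient, location) tuple and that `location` is where the sender was when the message was sent. The function name `incoming_peaks` and the sample rows are hypothetical; for each location, the function reports the five-minute bin with the most messages to the target ID.

```python
from collections import Counter, defaultdict
from datetime import datetime


def incoming_peaks(records, target, bin_minutes=5):
    """For each location, the busiest bin (start, count) of messages received by target."""
    per_location = defaultdict(Counter)
    for ts, _sender, recipient, location in records:
        if recipient != target:
            continue
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        start = t.replace(minute=t.minute - t.minute % bin_minutes, second=0)
        per_location[location][start.strftime("%H:%M")] += 1
    return {loc: bins.most_common(1)[0] for loc, bins in per_location.items()}


sample = [
    ("2014-06-08 12:00:30", "1", "839736", "Wet Land"),
    ("2014-06-08 12:01:10", "2", "839736", "Wet Land"),
    ("2014-06-08 12:07:00", "3", "839736", "Wet Land"),
    ("2014-06-08 15:02:00", "4", "839736", "Coaster Alley"),
    ("2014-06-08 15:03:00", "5", "1278894", "Coaster Alley"),
]
print(incoming_peaks(sample, "839736"))
```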

 

 

Figure 3: Incoming messages received by the ID 839736. Interactive version

 

As expected, there is a high peak of incoming messages from Wet Land at 12:00 p.m. We also found a second peak, at 3:00 p.m., from users in Coaster Alley. Overall, Wet Land is the largest source of incoming messages to this ID.

 

From this analysis, we infer that a relevant event took place in Wet Land around 12:00 p.m. Since Wet Land is the area where the show took place, we believe this is the moment the park authorities detected the crime.

 

 

 

MC2.2 Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

Limit your response to no more than 10 images and 1000 words.

 

Based on our previous analysis, we hypothesize that the crime was detected around 12 p.m. on Sunday, so we now focus on the communication patterns around that time.

Looking at the total number of messages per day, hour and location, we find a sharp increase in Wet Land and Entry Corridor compared with the previous days. After 12 p.m., messages sent from Wet Land also drop clearly on Sunday compared with Friday and Saturday, suggesting that the area near the show was probably closed shortly after 12 p.m.

 

 


Figure 4: Total messages sent and received for each location. Interactive version

 

Figure 4 also shows a peak in outbound messages from Entry Corridor compared with the previous days, and a decrease in Wet Land after the peak.

Next, we look more closely at who communicated with whom around this time. Among the IDs that communicated with external recipients, we found a few outliers: around the time of the crime there is a peak in the number of messages sent from Wet Land (Figure 5).
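The day-to-day comparison can be expressed as the ratio between Sunday’s hourly count and the average of Friday and Saturday for the same location and hour. This is a hypothetical sketch (the function name `sunday_vs_baseline`, the sample rows and the choice of a simple mean as baseline are assumptions); a ratio well above 1 flags a Sunday surge, and a ratio well below 1 flags a drop.

```python
from collections import Counter
from datetime import datetime


def sunday_vs_baseline(records):
    """Ratio of Sunday counts to the Friday/Saturday mean, per (location, hour)."""
    counts = Counter()
    for ts, _sender, _recipient, location in records:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        counts[(t.strftime("%A"), location, t.hour)] += 1
    ratios = {}
    for (day, location, hour), n in counts.items():
        if day != "Sunday":
            continue
        # Counter returns 0 for missing keys without inserting them.
        base = (counts[("Friday", location, hour)] + counts[("Saturday", location, hour)]) / 2
        ratios[(location, hour)] = n / base if base else float("inf")
    return ratios


sample = [
    ("2014-06-06 12:10:00", "1", "2", "Wet Land"),        # Friday
    ("2014-06-07 12:20:00", "3", "4", "Wet Land"),        # Saturday
    ("2014-06-08 12:05:00", "5", "6", "Wet Land"),        # Sunday
    ("2014-06-08 12:15:00", "7", "8", "Wet Land"),
    ("2014-06-08 12:25:00", "9", "10", "Wet Land"),
    ("2014-06-08 12:30:00", "11", "external", "Entry Corridor"),
]
print(sunday_vs_baseline(sample))
```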

 

 


Figure 5: Total messages sent to External for each location. Interactive version

 

From these figures we hypothesize that the incident was discovered around 12 p.m., so we zoomed into the IDs that were most active at 11 a.m. (Figure 6). These IDs sent few messages before 11 a.m., but at around 11:30 a.m. their message volume increased significantly, with very strong activity from 11:30 to 11:47. After 11:47, two facts stand out: first, their activity drops abruptly; second, all of the messages they sent went to external numbers. Based on this pattern, we believe these IDs were involved in the incident.
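The switch to external recipients after 11:47 can be quantified per ID by comparing the share of messages addressed to external numbers before and after the cutoff. A minimal sketch, assuming records are (timestamp, sender, recipient, location) tuples from a single day and that external recipients appear as the literal string "external"; the function name `external_share` and the sample data are hypothetical.

```python
from datetime import datetime, time


def external_share(records, sender, cutoff=time(11, 47)):
    """Share of a sender's messages addressed to 'external' before and after cutoff."""
    counts = {"before": [0, 0], "after": [0, 0]}  # [to external, total]
    for ts, s, recipient, _location in records:
        if s != sender:
            continue
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").time()
        side = "before" if t < cutoff else "after"
        counts[side][1] += 1
        if recipient == "external":
            counts[side][0] += 1
    # None marks a period in which the sender sent nothing.
    return {side: (ext / total if total else None) for side, (ext, total) in counts.items()}


sample = [
    ("2014-06-08 11:30:00", "X", "123", "Wet Land"),
    ("2014-06-08 11:40:00", "X", "external", "Wet Land"),
    ("2014-06-08 11:50:00", "X", "external", "Wet Land"),
    ("2014-06-08 11:55:00", "X", "external", "Wet Land"),
    ("2014-06-08 11:50:00", "Y", "9", "Wet Land"),
]
print(external_share(sample, "X"))  # {'before': 0.5, 'after': 1.0}
```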

 

 

Figure 6: Active IDs at 11 a.m. Interactive version

 

We now focus on what happened after the incident. At 12 p.m. there is another peak of communications from Wet Land, and we hypothesize that the incident was discovered around this time. Filtering the IDs with the highest activity at 12 p.m., we found 7 IDs that stand out from the rest. These IDs sent messages to ID 839736 every minute until a few minutes after 12:30 p.m. (Figure 7).

Two of them, 38945 and 95112, also sent messages to ID 1278894 every few minutes, and IDs 731443 and 947320 also communicated with 36486. IDs 1092525 and 1601276 were the only ones that sent messages to no IDs other than 839736 and 1278894. All of these IDs dropped their communications to almost zero after 12:30 p.m. This group is probably part of the park’s security team.
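Finding the IDs that messaged 839736 in every minute of a window reduces to counting distinct minutes per sender. A hypothetical sketch (the function name `every_minute_senders`, the window bounds and the sample rows are illustrative), assuming records are (timestamp, sender, recipient, location) tuples:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def every_minute_senders(records, target, start, end):
    """Senders that messaged `target` in every minute of [start, end)."""
    t0 = datetime.strptime(start, "%Y-%m-%d %H:%M")
    t1 = datetime.strptime(end, "%Y-%m-%d %H:%M")
    needed = (t1 - t0) // timedelta(minutes=1)
    minutes = defaultdict(set)  # sender -> distinct minutes with a message to target
    for ts, sender, recipient, _location in records:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if recipient == target and t0 <= t < t1:
            minutes[sender].add(t.replace(second=0))
    return sorted(s for s, seen in minutes.items() if len(seen) == needed)


sample = [
    ("2014-06-08 12:00:10", "A", "839736", "Wet Land"),
    ("2014-06-08 12:01:20", "A", "839736", "Wet Land"),
    ("2014-06-08 12:02:59", "A", "839736", "Wet Land"),
    ("2014-06-08 12:00:05", "B", "839736", "Wet Land"),
    ("2014-06-08 12:02:00", "B", "839736", "Wet Land"),
    ("2014-06-08 12:00:00", "C", "111", "Wet Land"),
    ("2014-06-08 12:01:00", "C", "111", "Wet Land"),
    ("2014-06-08 12:02:00", "C", "111", "Wet Land"),
]
print(every_minute_senders(sample, "839736", "2014-06-08 12:00", "2014-06-08 12:03"))  # ['A']
```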

 

Figure 7: Active IDs at 12 p.m. Interactive version

 

We have identified the potential sources of the disturbance (the most active users in Wet Land at 11 a.m., Figure 6) and potential members of the park’s security team (the most active users in Wet Land at 12 p.m., Figure 7). Our next step is to look for evidence of how the vandalism was planned.

To do so, we identified the IDs that visited the park on both Friday and Sunday and communicated only with external recipients (Figure 8). IDs 554218, 107490 and 1685871 sent messages from areas surrounding Wet Land and along the road from the entrance towards Wet Land and Coaster Alley. We suspect that at least one of these three individuals came to the park on Friday to plan Sunday’s attack. ID 1965716 does not seem suspicious, because on Sunday it sent messages only from Kiddie Land.
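The filter used for Figure 8 can be approximated with set operations: keep the IDs that sent messages on each requested day and whose recipients are all external. This sketch assumes that presence in the park is approximated by sending at least one message that day and that external recipients are encoded as "external"; the function name `external_only_on_days` and the sample rows are hypothetical. Passing ("Saturday", "Sunday") gives the analogous filter for Figure 9.

```python
from collections import defaultdict
from datetime import datetime


def external_only_on_days(records, days):
    """IDs that sent messages on every weekday in `days` and only ever to 'external'."""
    active_days = defaultdict(set)  # sender -> weekdays with at least one outbound message
    recipients = defaultdict(set)   # sender -> distinct recipients over all records given
    for ts, sender, recipient, _location in records:
        active_days[sender].add(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%A"))
        recipients[sender].add(recipient)
    return sorted(
        s for s in active_days
        if set(days) <= active_days[s] and recipients[s] == {"external"}
    )


sample = [
    ("2014-06-06 10:00:00", "554218", "external", "Wet Land"),
    ("2014-06-08 11:00:00", "554218", "external", "Wet Land"),
    ("2014-06-06 10:05:00", "999", "external", "Kiddie Land"),
    ("2014-06-08 11:05:00", "999", "123", "Kiddie Land"),
    ("2014-06-08 11:10:00", "777", "external", "Coaster Alley"),
]
print(external_only_on_days(sample, ("Friday", "Sunday")))  # ['554218']
```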



Figure 8: IDs that sent messages only to External on both Friday and Sunday. Interactive version

 

Among the IDs present on both Saturday and Sunday, we found ten that sent messages only to external recipients (Figure 9). The possible mastermind(s) of the crime, IDs 708696 and 2030671, may have been in Wet Land and Coaster Alley the day before the crime, at similar times of day, to plan the attack.

 

 

Figure 9: IDs that sent messages only to External on both Saturday and Sunday. Interactive version

 

Another pattern emerged when we compared the number of people sending messages to groups rather than to individuals. In Figure 10, the number of IDs that sent group messages between 11:45 a.m. and 12:30 p.m. on Sunday is much higher than on the other days. It seems reasonable to assume that an extraordinary event occurred during this period, prompting individuals to alert their family, friends or coworkers through group messages. Interestingly, the value falls, rises again 15 minutes later, and rises once more at 2:00 p.m.
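Group messages are not explicit in a simple (timestamp, sender, recipient, location) table. The sketch below assumes, as a hypothesis rather than a documented property of the data, that one group message appears as several rows sharing the same sender and timestamp, and counts distinct group senders per 15-minute bin. The function name `group_senders_per_bin` and the sample rows are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime


def group_senders_per_bin(records, bin_minutes=15):
    """Distinct senders of group messages per (weekday, bin start).

    ASSUMPTION: a group message shows up as several rows that share
    the same sender and timestamp.
    """
    rows = Counter((ts, sender) for ts, sender, _recipient, _location in records)
    senders = defaultdict(set)
    for (ts, sender), n_rows in rows.items():
        if n_rows < 2:
            continue  # a single row is treated as a one-to-one message
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        start = t.replace(minute=t.minute - t.minute % bin_minutes, second=0)
        senders[start.strftime("%a %H:%M")].add(sender)
    return {b: len(s) for b, s in senders.items()}


sample = [
    ("2014-06-08 11:46:00", "A", "1", "Wet Land"),
    ("2014-06-08 11:46:00", "A", "2", "Wet Land"),
    ("2014-06-08 11:50:00", "B", "3", "Wet Land"),
    ("2014-06-08 11:50:00", "B", "4", "Wet Land"),
    ("2014-06-08 11:50:00", "B", "5", "Wet Land"),
    ("2014-06-08 11:52:00", "C", "6", "Wet Land"),
]
print(group_senders_per_bin(sample))  # {'Sun 11:45': 2}
```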

 

Figure 10. Total number of IDs that sent messages to groups. Interactive version

 

Figure 11 shows the average time between messages, per hour, at the different locations of the park. All locations behave similarly except Wet Land: between 10 a.m. and 1 p.m., its average is markedly lower than in the rest of the park. Because the plot shows time between messages, this lower average means a markedly higher message frequency in Wet Land during those hours.
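The average time between messages can be computed by sorting each location’s timestamps and averaging consecutive gaps, attributing each gap to the hour of the later message. This is a minimal sketch under assumed conventions (it is meant to be run on one day’s records, and overnight gaps are ignored); the function name `mean_gap_by_hour` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_gap_by_hour(records):
    """Mean seconds between consecutive messages from the same location, per (location, hour)."""
    times = defaultdict(list)
    for ts, _sender, _recipient, location in records:
        times[location].append(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
    gaps = defaultdict(list)
    for location, stamps in times.items():
        stamps.sort()
        for prev, cur in zip(stamps, stamps[1:]):
            if prev.date() == cur.date():  # ignore overnight gaps
                # Attribute each gap to the hour of the later message.
                gaps[(location, cur.hour)].append((cur - prev).total_seconds())
    return {key: mean(values) for key, values in gaps.items()}


sample = [
    ("2014-06-08 10:00:00", "1", "2", "Wet Land"),
    ("2014-06-08 10:00:10", "3", "4", "Wet Land"),
    ("2014-06-08 10:00:30", "5", "6", "Wet Land"),
    ("2014-06-08 10:00:00", "7", "8", "Tundra Land"),
    ("2014-06-08 10:02:00", "9", "10", "Tundra Land"),
]
print(mean_gap_by_hour(sample))
```

A lower mean gap for a location and hour corresponds to a higher message frequency, which is how the Wet Land dip in Figure 11 is read.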

 

Figure 11. Average time between messages. Interactive version

 

Analysing the messages sent by visitors who were present on all three days (Figure 12), we noticed that on Friday people communicated more from Coaster Alley, Entry Corridor and Kiddie Land than on the other days. On Sunday, after the 11:45 a.m. peak, communications dropped in Wet Land, probably because of the closure of the Pavilion, while they increased in Tundra Land, which is right next to Wet Land. Our hypothesis is that after the incident some parts of Wet Land were closed and people moved to Tundra Land.

 

 

Figure 12. Number of messages sent by IDs present on all three days. Interactive version

 

Focusing on the 10:00 a.m. to 3:00 p.m. interval on Sunday (Figure 13), messages increased in Wet Land around 11:45 a.m. Immediately afterwards, around 12:00 p.m., they also started to increase in Entry Corridor. In both areas, the number of messages dropped close to 12:30 p.m. We assume that the vandalism at Creighton Pavilion was noticed by people nearby, who started to alert their groups and security. There was also a peak in Coaster Alley around 11:00 a.m., but we assume it is not related to the crime, as most of those messages were sent to people in the same area.

 

 

Figure 13. Number of messages sent by IDs present on all three days, 10:00 to 15:00. Interactive version

 

 

MC2.3 From this data, can you hypothesize when the crime was discovered? Describe your rationale.

Limit your response to no more than 3 images and 300 words.

 

From the analysis of the high-volume IDs in the first section, we believe the park authorities informed people in the park around 12:00 p.m. that there was a problem in Wet Land.

We also found other patterns showing high volumes of communication between 11:30 a.m. and 11:45 a.m. For example, Figure 14 shows a strong peak in the number of messages sent from Wet Land, while the number of unique IDs sending them grows at a constant rate. This indicates that some people started communicating much more heavily, rather than many new people starting to communicate.
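The comparison in Figure 14 can be reproduced per minute by counting messages and distinct senders at one location; a messages-per-sender ratio that rises while unique senders grow slowly points to the same people sending more. A hypothetical sketch (the function name `messages_vs_senders` and the data are illustrative, with records assumed to be (timestamp, sender, recipient, location) tuples):

```python
from collections import defaultdict
from datetime import datetime


def messages_vs_senders(records, location):
    """Per minute at one location: (messages, unique senders, messages per sender)."""
    senders_by_minute = defaultdict(list)
    for ts, sender, _recipient, loc in records:
        if loc == location:
            minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%H:%M")
            senders_by_minute[minute].append(sender)
    return {
        minute: (len(s), len(set(s)), len(s) / len(set(s)))
        for minute, s in sorted(senders_by_minute.items())
    }


sample = [
    ("2014-06-08 11:40:01", "A", "1", "Wet Land"),
    ("2014-06-08 11:40:10", "A", "2", "Wet Land"),
    ("2014-06-08 11:40:20", "A", "3", "Wet Land"),
    ("2014-06-08 11:40:30", "B", "4", "Wet Land"),
    ("2014-06-08 11:41:00", "C", "5", "Wet Land"),
    ("2014-06-08 11:40:05", "D", "6", "Entry Corridor"),
]
print(messages_vs_senders(sample, "Wet Land"))
```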

 

 

Figure 14. Number of unique IDs versus total number of messages sent. Interactive version

 

To understand when the event was discovered, we traced the evolution of several network metrics over time. Figure 15 shows the global clustering coefficient, the mean betweenness and the maximum degree, each computed over a one-minute window at every timestamp to reveal changes in the network structure.

The clustering coefficient plot shows a pattern similar to the one found in Wet Land. Such a drastic increase implies that people became heavily connected to their neighbours around the time of the incident, and that network connectivity reached a maximum at around 12:00 p.m. (see the peaks in betweenness and maximum degree).
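The three metrics in Figure 15 can be sketched without external libraries. This standard-library version treats one window’s (sender, recipient) pairs as an undirected simple graph and computes the global clustering coefficient (transitivity), the mean unnormalised betweenness via Brandes’ algorithm, and the maximum degree. It is an illustrative re-implementation, not the team’s igraph code; whether the original graphs were directed or included the external node is not stated, so both choices here are assumptions, and the function names are hypothetical. In R, igraph’s `transitivity(g, type = "global")`, `betweenness()` and `degree()` compute comparable quantities, although their defaults may differ.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected simple graph as adjacency sets; self-loops are dropped."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def transitivity(adj):
    """Global clustering coefficient: closed triples / connected triples."""
    closed = triples = 0
    for nbrs in adj.values():
        nb = list(nbrs)
        triples += len(nb) * (len(nb) - 1) // 2
        closed += sum(1 for i in range(len(nb)) for j in range(i + 1, len(nb))
                      if nb[j] in adj[nb[i]])
    return closed / triples if triples else 0.0


def mean_betweenness(adj):
    """Mean unnormalised betweenness (Brandes' algorithm, unweighted, undirected)."""
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the total.
    return sum(bc.values()) / 2 / len(nodes) if nodes else 0.0


def window_metrics(edges):
    """(global clustering, mean betweenness, max degree) for one time window's edges."""
    adj = build_graph(edges)
    if not adj:
        return (0.0, 0.0, 0)
    return (transitivity(adj), mean_betweenness(adj), max(len(n) for n in adj.values()))


print(window_metrics([("h", "x"), ("h", "y"), ("h", "z")]))  # (0.0, 0.75, 3)
```

Calling `window_metrics` once per one-minute slice of (sender, recipient) pairs yields the three time series.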

 

 

 

Figure 15. Network metrics computed with a one-minute window for each timestamp on Sunday between 11:00 a.m. and 1:00 p.m. Interactive version

 

The two remaining plots reveal patterns similar to those in Figure 1. We know that a singular event took place around 12:00 p.m., that it involved large volumes of communication between Entry Corridor and Wet Land, and that the high-volume IDs account for most of the traffic.

With these considerations in mind, we believe that the incident took place between 11:30 a.m. and 11:45 a.m.